fix(llmobs): openai-java payload mapping for responses, tool metadata, and prompt tracking by ygree · Pull Request #10644 · DataDog/dd-trace-java

ygree · 2026-02-19T21:45:14Z

What Does This Do

Aligns OpenAI Java LLMObs span payloads with the expected intake/system-test schema by:

- Adding/filling missing LLMObs tags:
  - _ml_obs_tag.integration
  - _ml_obs_tag.source
  - _ml_obs_tag.ddtrace.version
  - _ml_obs_tag.error
  - _ml_obs_tag.error_type
- Ensuring model_name (and stable placeholder output where applicable) is set on error paths for chat/completions/embeddings/responses.
- Expanding Responses instrumentation:
  - prompt tracking (input.prompt, variables, chat_template)
  - tool definition extraction (tool_definitions)
  - tool call/result extraction across function/custom/MCP outputs
  - metadata normalization (stream, tool_choice, text.verbosity, etc.)
- Updating LLMObs mapper payload shape:
  - writes _dd map with span/trace ids
  - nests error fields under meta.error
  - supports map-based LLM input serialization (messages + prompt)
  - remaps tool_definitions into meta.
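
The mapper payload changes listed above can be pictured with a small sketch of the resulting span-event shape. It is an illustration under assumptions, not the mapper's code: the class and method names are hypothetical, the real mapper writes msgpack rather than a `Map`, and the key names inside `_dd` (`span_id`, `trace_id`, `apm_trace_id`) and inside `meta.error` (`message`, `type`) are assumed from the review discussion rather than taken from the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the LLMObs span-event shape described in this PR. */
final class LlmObsEventShapeSketch {

  static Map<String, Object> buildEvent(
      String spanId,
      String traceId,
      String apmTraceId,
      int errorFlag,
      String errorMessage,
      String errorType) {
    Map<String, Object> event = new LinkedHashMap<>();
    event.put("span_id", spanId);
    event.put("trace_id", traceId);
    // A string status is written instead of a top-level integer error field.
    event.put("status", errorFlag == 0 ? "ok" : "error");

    // _dd map carrying the span/trace ids (key names are assumptions).
    Map<String, Object> dd = new LinkedHashMap<>();
    dd.put("span_id", spanId);
    dd.put("trace_id", traceId);
    dd.put("apm_trace_id", apmTraceId);
    event.put("_dd", dd);

    // Error details are nested under meta.error (inner key names are assumptions).
    Map<String, Object> meta = new LinkedHashMap<>();
    if (errorFlag != 0) {
      Map<String, Object> error = new LinkedHashMap<>();
      error.put("message", errorMessage);
      error.put("type", errorType);
      meta.put("error", error);
    }
    event.put("meta", meta);
    return event;
  }
}
```

Passing `apmTraceId` separately keeps the sketch neutral on whether the APM trace id and the LLMObs trace id are always equal.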

Motivation

OpenAI/LLMObs system tests exposed schema and tag mismatches in Java payloads (especially response spans, tool metadata, error mapping, and prompt tracking structure). This change brings Java output in line with the expected LLMObs intake contract and behavior.

Additional Notes

The openai-java-3.0 minimum version is bumped from 3.0.0 to 3.0.1, because ResponseTextConfig fun verbosity(): Optional<Verbosity> was added in 3.0.1: openai/openai-java@c1de354#diff-6b385fb153d457757ba112e6117593cb59da6af308cce0f9b6f26e3885befc6cR73

DataDog/dd-apm-test-agent#280
DataDog/system-tests#6364

Contributor Checklist

Format the title according to the contribution guidelines
Assign the type: and (comp: or inst:) labels in addition to any other useful labels
Avoid using close, fix, or any linking keywords when referencing an issue
Use solves instead, and assign the PR milestone to the issue
Update the CODEOWNERS file on source file addition, migration, or deletion
Update public documentation with any new configuration flags or behaviors

Jira ticket: [PROJ-IDENT]

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

pr-commenter · 2026-02-19T22:33:17Z

Benchmarks

Startup

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	ygree/llmobs-systest-fixes
git_commit_date	1773939812	1774929373
git_commit_sha	`5580c61`	`d7d4866`
release_version	1.61.0-SNAPSHOT~5580c61ac4	1.60.0-SNAPSHOT~d7d4866358

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1774931196	1774931196
ci_job_id	1553021997	1553021997
ci_pipeline_id	105184639	105184639
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-iclaexns 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-iclaexns 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module	Agent	Agent
parent	None	None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 61 metrics, 10 unstable metrics.

Startup time reports for petclinic

gantt
    title petclinic - global startup overhead: candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.066 s) : 0, 1065982
Total [baseline] (11.034 s) : 0, 11034499
Agent [candidate] (1.067 s) : 0, 1067232
Total [candidate] (11.157 s) : 0, 11157281
section appsec
Agent [baseline] (1.251 s) : 0, 1251474
Total [baseline] (11.214 s) : 0, 11213666
Agent [candidate] (1.252 s) : 0, 1251814
Total [candidate] (11.214 s) : 0, 11214115
section iast
Agent [baseline] (1.236 s) : 0, 1235512
Total [baseline] (11.358 s) : 0, 11357729
Agent [candidate] (1.233 s) : 0, 1232603
Total [candidate] (11.389 s) : 0, 11388723
section profiling
Agent [baseline] (1.188 s) : 0, 1188429
Total [baseline] (11.125 s) : 0, 11125225
Agent [candidate] (1.188 s) : 0, 1187616
Total [candidate] (11.08 s) : 0, 11080254

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.066 s	-
Agent	appsec	1.251 s	185.492 ms (17.4%)
Agent	iast	1.236 s	169.53 ms (15.9%)
Agent	profiling	1.188 s	122.447 ms (11.5%)
Total	tracing	11.034 s	-
Total	appsec	11.214 s	179.167 ms (1.6%)
Total	iast	11.358 s	323.231 ms (2.9%)
Total	profiling	11.125 s	90.726 ms (0.8%)
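
The Δ tracing column above appears to be the difference between a variant's duration and the tracing-only duration, expressed both in milliseconds and as a percentage of the tracing-only run. The sketch below reproduces that arithmetic for the baseline Agent/appsec row; the class and method names are hypothetical, and the formula is inferred from the reported numbers rather than from the benchmark tool's source.

```java
import java.util.Locale;

/** Hypothetical helper reproducing the "Δ tracing" arithmetic from the table. */
final class StartupOverheadSketch {

  /** Overhead of a variant relative to the tracing-only run, in percent. */
  static double overheadPercent(double variantMicros, double tracingMicros) {
    return (variantMicros - tracingMicros) / tracingMicros * 100.0;
  }

  public static void main(String[] args) {
    // Baseline Agent durations in microseconds, taken from the gantt data above.
    double tracing = 1_065_982;
    double appsec = 1_251_474;
    double deltaMillis = (appsec - tracing) / 1000.0;
    System.out.println(
        String.format(
            Locale.ROOT,
            "appsec delta: %.3f ms (%.1f%%)",
            deltaMillis,
            overheadPercent(appsec, tracing)));
  }
}
```

With these inputs the delta is 185.492 ms, about 17.4% of the tracing-only duration, which matches the Agent/appsec row.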

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.067 s	-
Agent	appsec	1.252 s	184.582 ms (17.3%)
Agent	iast	1.233 s	165.371 ms (15.5%)
Agent	profiling	1.188 s	120.384 ms (11.3%)
Total	tracing	11.157 s	-
Total	appsec	11.214 s	56.833 ms (0.5%)
Total	iast	11.389 s	231.442 ms (2.1%)
Total	profiling	11.08 s	-77.027 ms (-0.7%)

gantt
    title petclinic - break down per module: candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.205 ms) : 0, 1205
crashtracking [candidate] (1.205 ms) : 0, 1205
BytebuddyAgent [baseline] (633.286 ms) : 0, 633286
BytebuddyAgent [candidate] (633.527 ms) : 0, 633527
AgentMeter [baseline] (29.666 ms) : 0, 29666
AgentMeter [candidate] (29.55 ms) : 0, 29550
GlobalTracer [baseline] (259.607 ms) : 0, 259607
GlobalTracer [candidate] (260.293 ms) : 0, 260293
AppSec [baseline] (32.08 ms) : 0, 32080
AppSec [candidate] (31.964 ms) : 0, 31964
Debugger [baseline] (60.91 ms) : 0, 60910
Debugger [candidate] (60.797 ms) : 0, 60797
Remote Config [baseline] (594.453 µs) : 0, 594
Remote Config [candidate] (600.111 µs) : 0, 600
Telemetry [baseline] (8.094 ms) : 0, 8094
Telemetry [candidate] (8.795 ms) : 0, 8795
Flare Poller [baseline] (4.409 ms) : 0, 4409
Flare Poller [candidate] (4.337 ms) : 0, 4337
section appsec
crashtracking [baseline] (1.217 ms) : 0, 1217
crashtracking [candidate] (1.196 ms) : 0, 1196
BytebuddyAgent [baseline] (661.863 ms) : 0, 661863
BytebuddyAgent [candidate] (660.807 ms) : 0, 660807
AgentMeter [baseline] (12.205 ms) : 0, 12205
AgentMeter [candidate] (12.134 ms) : 0, 12134
GlobalTracer [baseline] (259.041 ms) : 0, 259041
GlobalTracer [candidate] (258.918 ms) : 0, 258918
AppSec [baseline] (177.815 ms) : 0, 177815
AppSec [candidate] (178.969 ms) : 0, 178969
Debugger [baseline] (66.048 ms) : 0, 66048
Debugger [candidate] (65.561 ms) : 0, 65561
Remote Config [baseline] (652.511 µs) : 0, 653
Remote Config [candidate] (660.975 µs) : 0, 661
Telemetry [baseline] (8.295 ms) : 0, 8295
Telemetry [candidate] (8.349 ms) : 0, 8349
Flare Poller [baseline] (3.551 ms) : 0, 3551
Flare Poller [candidate] (4.42 ms) : 0, 4420
IAST [baseline] (24.237 ms) : 0, 24237
IAST [candidate] (24.303 ms) : 0, 24303
section iast
crashtracking [baseline] (1.202 ms) : 0, 1202
crashtracking [candidate] (1.222 ms) : 0, 1222
BytebuddyAgent [baseline] (800.913 ms) : 0, 800913
BytebuddyAgent [candidate] (799.254 ms) : 0, 799254
AgentMeter [baseline] (11.578 ms) : 0, 11578
AgentMeter [candidate] (11.427 ms) : 0, 11427
GlobalTracer [baseline] (249.318 ms) : 0, 249318
GlobalTracer [candidate] (248.472 ms) : 0, 248472
AppSec [baseline] (26.646 ms) : 0, 26646
AppSec [candidate] (26.474 ms) : 0, 26474
Debugger [baseline] (70.282 ms) : 0, 70282
Debugger [candidate] (70.291 ms) : 0, 70291
Remote Config [baseline] (539.482 µs) : 0, 539
Remote Config [candidate] (541.683 µs) : 0, 542
Telemetry [baseline] (9.774 ms) : 0, 9774
Telemetry [candidate] (9.723 ms) : 0, 9723
Flare Poller [baseline] (3.563 ms) : 0, 3563
Flare Poller [candidate] (3.573 ms) : 0, 3573
IAST [baseline] (25.426 ms) : 0, 25426
IAST [candidate] (25.441 ms) : 0, 25441
section profiling
ProfilingAgent [baseline] (94.44 ms) : 0, 94440
ProfilingAgent [candidate] (93.874 ms) : 0, 93874
crashtracking [baseline] (1.183 ms) : 0, 1183
crashtracking [candidate] (1.181 ms) : 0, 1181
BytebuddyAgent [baseline] (685.267 ms) : 0, 685267
BytebuddyAgent [candidate] (685.71 ms) : 0, 685710
AgentMeter [baseline] (9.024 ms) : 0, 9024
AgentMeter [candidate] (9.029 ms) : 0, 9029
GlobalTracer [baseline] (216.145 ms) : 0, 216145
GlobalTracer [candidate] (216.332 ms) : 0, 216332
AppSec [baseline] (32.523 ms) : 0, 32523
AppSec [candidate] (32.374 ms) : 0, 32374
Debugger [baseline] (65.841 ms) : 0, 65841
Debugger [candidate] (66.089 ms) : 0, 66089
Remote Config [baseline] (567.356 µs) : 0, 567
Remote Config [candidate] (565.814 µs) : 0, 566
Telemetry [baseline] (7.807 ms) : 0, 7807
Telemetry [candidate] (7.693 ms) : 0, 7693
Flare Poller [baseline] (4.331 ms) : 0, 4331
Flare Poller [candidate] (3.479 ms) : 0, 3479
Profiling [baseline] (95.016 ms) : 0, 95016
Profiling [candidate] (94.434 ms) : 0, 94434

Startup time reports for insecure-bank

gantt
    title insecure-bank - global startup overhead: candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.06 s) : 0, 1060219
Total [baseline] (8.829 s) : 0, 8829299
Agent [candidate] (1.058 s) : 0, 1058299
Total [candidate] (8.844 s) : 0, 8843544
section iast
Agent [baseline] (1.232 s) : 0, 1232137
Total [baseline] (9.579 s) : 0, 9578833
Agent [candidate] (1.231 s) : 0, 1231212
Total [candidate] (9.55 s) : 0, 9549685

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.06 s	-
Agent	iast	1.232 s	171.918 ms (16.2%)
Total	tracing	8.829 s	-
Total	iast	9.579 s	749.534 ms (8.5%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.058 s	-
Agent	iast	1.231 s	172.913 ms (16.3%)
Total	tracing	8.844 s	-
Total	iast	9.55 s	706.141 ms (8.0%)

gantt
    title insecure-bank - break down per module: candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.217 ms) : 0, 1217
crashtracking [candidate] (1.207 ms) : 0, 1207
BytebuddyAgent [baseline] (631.03 ms) : 0, 631030
BytebuddyAgent [candidate] (630.07 ms) : 0, 630070
AgentMeter [baseline] (29.378 ms) : 0, 29378
AgentMeter [candidate] (29.539 ms) : 0, 29539
GlobalTracer [baseline] (257.528 ms) : 0, 257528
GlobalTracer [candidate] (257.604 ms) : 0, 257604
AppSec [baseline] (32.081 ms) : 0, 32081
AppSec [candidate] (31.867 ms) : 0, 31867
Debugger [baseline] (59.916 ms) : 0, 59916
Debugger [candidate] (59.734 ms) : 0, 59734
Remote Config [baseline] (590.982 µs) : 0, 591
Remote Config [candidate] (585.146 µs) : 0, 585
Telemetry [baseline] (8.775 ms) : 0, 8775
Telemetry [candidate] (8.03 ms) : 0, 8030
Flare Poller [baseline] (3.565 ms) : 0, 3565
Flare Poller [candidate] (3.516 ms) : 0, 3516
section iast
crashtracking [baseline] (1.208 ms) : 0, 1208
crashtracking [candidate] (1.196 ms) : 0, 1196
BytebuddyAgent [baseline] (799.787 ms) : 0, 799787
BytebuddyAgent [candidate] (799.511 ms) : 0, 799511
AgentMeter [baseline] (11.373 ms) : 0, 11373
AgentMeter [candidate] (11.405 ms) : 0, 11405
GlobalTracer [baseline] (248.079 ms) : 0, 248079
GlobalTracer [candidate] (248.357 ms) : 0, 248357
AppSec [baseline] (26.661 ms) : 0, 26661
AppSec [candidate] (26.539 ms) : 0, 26539
Debugger [baseline] (67.347 ms) : 0, 67347
Debugger [candidate] (68.264 ms) : 0, 68264
Remote Config [baseline] (536.997 µs) : 0, 537
Remote Config [candidate] (528.589 µs) : 0, 529
Telemetry [baseline] (11.364 ms) : 0, 11364
Telemetry [candidate] (10.096 ms) : 0, 10096
Flare Poller [baseline] (3.857 ms) : 0, 3857
Flare Poller [candidate] (3.649 ms) : 0, 3649
IAST [baseline] (25.45 ms) : 0, 25450
IAST [candidate] (25.466 ms) : 0, 25466

Load

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	ygree/llmobs-systest-fixes
git_commit_date	1773939812	1774929373
git_commit_sha	`5580c61`	`d7d4866`
release_version	1.61.0-SNAPSHOT~5580c61ac4	1.60.0-SNAPSHOT~d7d4866358

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1774931590	1774931590
ci_job_id	1553021998	1553021998
ci_pipeline_id	105184639	105184639
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-cddcgd5l 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-cddcgd5l 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 20 metrics, 16 unstable metrics.

Request duration reports for insecure-bank

gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.234 ms) : 1222, 1246
.   : milestone, 1234,
iast (3.242 ms) : 3199, 3284
.   : milestone, 3242,
iast_FULL (5.863 ms) : 5805, 5921
.   : milestone, 5863,
iast_GLOBAL (3.561 ms) : 3500, 3622
.   : milestone, 3561,
profiling (2.159 ms) : 2139, 2180
.   : milestone, 2159,
tracing (1.872 ms) : 1856, 1887
.   : milestone, 1872,
section candidate
no_agent (1.217 ms) : 1207, 1228
.   : milestone, 1217,
iast (3.361 ms) : 3317, 3406
.   : milestone, 3361,
iast_FULL (5.886 ms) : 5828, 5945
.   : milestone, 5886,
iast_GLOBAL (3.599 ms) : 3544, 3654
.   : milestone, 3599,
profiling (2.094 ms) : 2076, 2112
.   : milestone, 2094,
tracing (1.856 ms) : 1840, 1873
.   : milestone, 1856,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.234 ms [1.222 ms, 1.246 ms]	-
iast	3.242 ms [3.199 ms, 3.284 ms]	2.007 ms (162.7%)
iast_FULL	5.863 ms [5.805 ms, 5.921 ms]	4.629 ms (375.1%)
iast_GLOBAL	3.561 ms [3.5 ms, 3.622 ms]	2.327 ms (188.5%)
profiling	2.159 ms [2.139 ms, 2.18 ms]	925.242 µs (75.0%)
tracing	1.872 ms [1.856 ms, 1.887 ms]	637.583 µs (51.7%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.217 ms [1.207 ms, 1.228 ms]	-
iast	3.361 ms [3.317 ms, 3.406 ms]	2.144 ms (176.1%)
iast_FULL	5.886 ms [5.828 ms, 5.945 ms]	4.669 ms (383.5%)
iast_GLOBAL	3.599 ms [3.544 ms, 3.654 ms]	2.382 ms (195.6%)
profiling	2.094 ms [2.076 ms, 2.112 ms]	876.495 µs (72.0%)
tracing	1.856 ms [1.84 ms, 1.873 ms]	638.782 µs (52.5%)

Request duration reports for petclinic

gantt
    title petclinic - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4
    dateFormat X
    axisFormat %s
section baseline
no_agent (19.196 ms) : 19001, 19391
.   : milestone, 19196,
appsec (18.655 ms) : 18466, 18845
.   : milestone, 18655,
code_origins (17.903 ms) : 17726, 18081
.   : milestone, 17903,
iast (18.397 ms) : 18211, 18584
.   : milestone, 18397,
profiling (19.219 ms) : 19028, 19410
.   : milestone, 19219,
tracing (18.04 ms) : 17863, 18218
.   : milestone, 18040,
section candidate
no_agent (19.31 ms) : 19115, 19506
.   : milestone, 19310,
appsec (18.903 ms) : 18708, 19098
.   : milestone, 18903,
code_origins (18.019 ms) : 17841, 18197
.   : milestone, 18019,
iast (19.245 ms) : 19054, 19437
.   : milestone, 19245,
profiling (18.781 ms) : 18593, 18969
.   : milestone, 18781,
tracing (18.845 ms) : 18656, 19034
.   : milestone, 18845,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	19.196 ms [19.001 ms, 19.391 ms]	-
appsec	18.655 ms [18.466 ms, 18.845 ms]	-540.582 µs (-2.8%)
code_origins	17.903 ms [17.726 ms, 18.081 ms]	-1.293 ms (-6.7%)
iast	18.397 ms [18.211 ms, 18.584 ms]	-798.791 µs (-4.2%)
profiling	19.219 ms [19.028 ms, 19.41 ms]	23.11 µs (0.1%)
tracing	18.04 ms [17.863 ms, 18.218 ms]	-1.156 ms (-6.0%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	19.31 ms [19.115 ms, 19.506 ms]	-
appsec	18.903 ms [18.708 ms, 19.098 ms]	-406.694 µs (-2.1%)
code_origins	18.019 ms [17.841 ms, 18.197 ms]	-1.291 ms (-6.7%)
iast	19.245 ms [19.054 ms, 19.437 ms]	-64.667 µs (-0.3%)
profiling	18.781 ms [18.593 ms, 18.969 ms]	-528.883 µs (-2.7%)
tracing	18.845 ms [18.656 ms, 19.034 ms]	-465.417 µs (-2.4%)

Dacapo

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	ygree/llmobs-systest-fixes
git_commit_date	1773939812	1774929373
git_commit_sha	`5580c61`	`d7d4866`
release_version	1.61.0-SNAPSHOT~5580c61ac4	1.60.0-SNAPSHOT~d7d4866358

See matching parameters

	Baseline	Candidate
application	biojava	biojava
ci_job_date	1774931359	1774931359
ci_job_id	1553021999	1553021999
ci_pipeline_id	105184639	105184639
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-rhiksb9l 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-rhiksb9l 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.

Execution time for tomcat

gantt
    title tomcat - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.478 ms) : 1467, 1490
.   : milestone, 1478,
appsec (3.775 ms) : 3554, 3996
.   : milestone, 3775,
iast (2.26 ms) : 2191, 2329
.   : milestone, 2260,
iast_GLOBAL (2.306 ms) : 2236, 2376
.   : milestone, 2306,
profiling (2.081 ms) : 2026, 2135
.   : milestone, 2081,
tracing (2.088 ms) : 2034, 2142
.   : milestone, 2088,
section candidate
no_agent (1.484 ms) : 1473, 1496
.   : milestone, 1484,
appsec (3.803 ms) : 3581, 4024
.   : milestone, 3803,
iast (2.258 ms) : 2189, 2327
.   : milestone, 2258,
iast_GLOBAL (2.307 ms) : 2237, 2376
.   : milestone, 2307,
profiling (2.101 ms) : 2046, 2157
.   : milestone, 2101,
tracing (2.068 ms) : 2014, 2121
.   : milestone, 2068,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.478 ms [1.467 ms, 1.49 ms]	-
appsec	3.775 ms [3.554 ms, 3.996 ms]	2.296 ms (155.4%)
iast	2.26 ms [2.191 ms, 2.329 ms]	781.831 µs (52.9%)
iast_GLOBAL	2.306 ms [2.236 ms, 2.376 ms]	827.764 µs (56.0%)
profiling	2.081 ms [2.026 ms, 2.135 ms]	602.774 µs (40.8%)
tracing	2.088 ms [2.034 ms, 2.142 ms]	609.68 µs (41.2%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.484 ms [1.473 ms, 1.496 ms]	-
appsec	3.803 ms [3.581 ms, 4.024 ms]	2.318 ms (156.2%)
iast	2.258 ms [2.189 ms, 2.327 ms]	773.954 µs (52.1%)
iast_GLOBAL	2.307 ms [2.237 ms, 2.376 ms]	822.457 µs (55.4%)
profiling	2.101 ms [2.046 ms, 2.157 ms]	617.153 µs (41.6%)
tracing	2.068 ms [2.014 ms, 2.121 ms]	583.474 µs (39.3%)

Execution time for biojava

gantt
    title biojava - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4
    dateFormat X
    axisFormat %s
section baseline
no_agent (14.88 s) : 14880000, 14880000
.   : milestone, 14880000,
appsec (14.608 s) : 14608000, 14608000
.   : milestone, 14608000,
iast (18.083 s) : 18083000, 18083000
.   : milestone, 18083000,
iast_GLOBAL (18.14 s) : 18140000, 18140000
.   : milestone, 18140000,
profiling (15.717 s) : 15717000, 15717000
.   : milestone, 15717000,
tracing (14.988 s) : 14988000, 14988000
.   : milestone, 14988000,
section candidate
no_agent (15.407 s) : 15407000, 15407000
.   : milestone, 15407000,
appsec (14.593 s) : 14593000, 14593000
.   : milestone, 14593000,
iast (18.251 s) : 18251000, 18251000
.   : milestone, 18251000,
iast_GLOBAL (17.785 s) : 17785000, 17785000
.   : milestone, 17785000,
profiling (15.464 s) : 15464000, 15464000
.   : milestone, 15464000,
tracing (15.094 s) : 15094000, 15094000
.   : milestone, 15094000,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	14.88 s [14.88 s, 14.88 s]	-
appsec	14.608 s [14.608 s, 14.608 s]	-272.0 ms (-1.8%)
iast	18.083 s [18.083 s, 18.083 s]	3.203 s (21.5%)
iast_GLOBAL	18.14 s [18.14 s, 18.14 s]	3.26 s (21.9%)
profiling	15.717 s [15.717 s, 15.717 s]	837.0 ms (5.6%)
tracing	14.988 s [14.988 s, 14.988 s]	108.0 ms (0.7%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	15.407 s [15.407 s, 15.407 s]	-
appsec	14.593 s [14.593 s, 14.593 s]	-814.0 ms (-5.3%)
iast	18.251 s [18.251 s, 18.251 s]	2.844 s (18.5%)
iast_GLOBAL	17.785 s [17.785 s, 17.785 s]	2.378 s (15.4%)
profiling	15.464 s [15.464 s, 15.464 s]	57.0 ms (0.4%)
tracing	15.094 s [15.094 s, 15.094 s]	-313.0 ms (-2.0%)

… with variables + chat_template, longest-first overlap handling) and support map-based LLM input serialization (messages + prompt) in LLMObs mapper. Also filter empty instruction messages to match system-test expectations.

Kyle-Verhoog

LLMObs Team Review

Nice work aligning the Java SDK payloads with the intake schema — this is a big step for system test compliance. A few items to address/clarify below (inline), plus some overall notes:

Test Coverage Notes

What's well-covered: LLMObsSpanMapperTest expansion is great — covers _dd map, nested meta.error, map-based input with prompt/chat_template, tool definitions, tool calls + tool results. The decorator tests verify the new tags (source, integration, error, ddtrace.version).

Gaps to consider:

- Error paths: No test exercises the error-path defaults (model_name and empty output set during withResponseCreateParams when the HTTP call fails). A test in which the response errors out, verifying that the span still has model_name and placeholder output, would be valuable.
- Prompt tracking: enrichInputWithPromptTracking(), extractChatTemplate(), extractPromptFromParams(), and normalizePromptVariable() have no unit tests. Template variable replacement edge cases (overlapping values, empty variables, image/file fallbacks) would increase confidence.
- Custom/MCP tool calls: ToolCallExtractor.getToolCall(ResponseCustomToolCall) and getToolCall(McpCall) are new with no unit tests.
- JsonValueUtils: New utility class with no dedicated tests for recursive JSON-to-Object conversion.
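
To make the "overlapping values" edge case concrete, here is a minimal sketch of reverse substitution with longest-first ordering: each variable value found in the rendered text is replaced by a placeholder, and longer values are handled first so that a value contained in another one does not clobber it. Everything in it is an assumption for illustration: the class and method names are hypothetical, the `{{name}}` placeholder syntax is assumed, and this is not the PR's extractChatTemplate() implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of longest-first variable substitution for a chat template. */
final class ChatTemplateSketch {

  static String toTemplate(String rendered, Map<String, String> variables) {
    List<Map.Entry<String, String>> entries = new ArrayList<>();
    for (Map.Entry<String, String> e : variables.entrySet()) {
      // Skip null/empty values: replacing "" would match at every position.
      if (e.getValue() != null && !e.getValue().isEmpty()) {
        entries.add(e);
      }
    }
    // Longest values first, so "New York City" is replaced before "New York".
    entries.sort(
        Comparator.comparingInt((Map.Entry<String, String> e) -> e.getValue().length())
            .reversed());

    String result = rendered;
    for (Map.Entry<String, String> e : entries) {
      // Known limitation of this sketch: a later value that happens to occur
      // inside an already inserted placeholder name would also be replaced.
      result = result.replace(e.getValue(), "{{" + e.getKey() + "}}");
    }
    return result;
  }
}
```

For example, with `city = "New York City"` and `state = "New York"`, the text `Weather in New York City, New York` becomes `Weather in {{city}}, {{state}}`; processing `state` first would instead leave a stray ` City` after the first placeholder.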

Questions

The min version bump from 3.0.0 to 3.0.1 — what API was missing in 3.0.0? This affects which customer versions get instrumented.
For the _dd map — does the intake expect apm_trace_id to equal trace_id? In other SDKs these can differ (APM trace ID vs LLMObs ID).

.github/workflows/run-system-tests.yaml

dd-trace-core/src/main/java/datadog/trace/llmobs/writer/ddintake/LLMObsSpanMapper.java

Kyle-Verhoog · 2026-03-22T04:34:50Z

dd-trace-core/src/main/java/datadog/trace/llmobs/writer/ddintake/LLMObsSpanMapper.java

-
-      boolean errored = span.getError() == 1;
+      writable.writeUTF8(STATUS);
+      writable.writeString(span.getError() == 0 ? "ok" : "error", null);


The top-level error: 0/1 integer field has been removed and replaced with status: "ok"/"error" + error details nested under meta.error. Can you confirm no downstream consumers (EvP remapper, indexer facets, etc.) read error from the top level? This is a payload shape change that could be breaking if anything depends on the old field.

This change is dictated by the TestOpenAiLlmInteractions::test_chat_completion assertion. I assume that the system test assertions are correct. Have they been verified as being compliant with the requirements of downstream consumers?

If I leave the top-level error field, the system test will fail.
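
The compatibility concern in this thread can be illustrated from the consumer side. The sketch below shows how a hypothetical downstream reader could tolerate both shapes, preferring the new string status and falling back to the old integer error field. The class name, the `Map` representation, and the existence of any such consumer are assumptions; nothing here describes the actual EvP remapper or indexer.

```java
import java.util.Map;

/** Hypothetical consumer-side check that accepts both the old and new payload shape. */
final class ErrorShapeCompatSketch {

  /** Returns true if the event is marked as errored under either shape. */
  static boolean isErrored(Map<String, Object> event) {
    // New shape: status is "ok" or "error".
    Object status = event.get("status");
    if (status != null) {
      return "error".equals(status);
    }
    // Old shape: top-level integer error flag (0 or 1).
    Object legacy = event.get("error");
    return legacy instanceof Number && ((Number) legacy).intValue() != 0;
  }
}
```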

...enai-java-3.0/src/main/java/datadog/trace/instrumentation/openai_java/ResponseDecorator.java

dd-trace-core/src/main/java/datadog/trace/llmobs/writer/ddintake/LLMObsSpanMapper.java

ygree · 2026-03-24T19:53:07Z

dd-java-agent/instrumentation/openai-java/openai-java-3.0/build.gradle

 apply from: "$rootDir/gradle/java.gradle"

-def minVer = '3.0.0'
+def minVer = '3.0.1'


ResponseTextConfig fun verbosity(): Optional<Verbosity> was added in 3.0.1 openai/openai-java@c1de354#diff-6b385fb153d457757ba112e6117593cb59da6af308cce0f9b6f26e3885befc6cR73

ygree · 2026-03-24T22:25:05Z

Questions

The min version bump from 3.0.0 to 3.0.1 — what API was missing in 3.0.0? This affects which customer versions get instrumented.

ResponseTextConfig fun verbosity(): Optional<Verbosity> was added in 3.0.1 openai/openai-java@c1de354#diff-6b385fb153d457757ba112e6117593cb59da6af308cce0f9b6f26e3885befc6cR73

For the _dd map — does the intake expect apm_trace_id to equal trace_id? In other SDKs these can differ (APM trace ID vs LLMObs ID).

This is aligned with dd-trace-py https://github.com/DataDog/dd-trace-py/blob/876c5f1ce4d173815537798a6a7b0ac15b0a4ede/ddtrace/llmobs/_llmobs.py#L618-L622.

amarziali

For apm-java, only the TagAssert file is affected, so I'm delegating the overall review to llmobs / idm.

ygree · 2026-04-01T16:55:16Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-04-01T16:55:21Z

View all feedbacks in Devflow UI.

2026-04-01 16:55:21 UTC ℹ️ Start processing command /merge

2026-04-01 16:55:26 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 2h (p90).

2026-04-01 17:52:55 UTC ℹ️ MergeQueue: This merge request was merged

ygree self-assigned this Feb 19, 2026

ygree added the "comp: mlobs" (ML Observability (LLMObs)) and "type: bug" (Bug report and fix) labels Feb 19, 2026

llmobs: set model tag even when llmobs disabled

cbd6226

ygree force-pushed the ygree/llmobs-systest-fixes branch from 5cd257e to cbd6226 Compare February 24, 2026 09:31

ygree changed the title ~~llmobs: set model tag even when llmobs disabled~~ fix(llmobs): set model tag even when llmobs disabled Mar 2, 2026

ygree added 23 commits March 2, 2026 13:30

Set metadata.stream tag whether it's true or false

4f27673

Set chat/completion CACHE_READ_INPUT_TOKENS tag

d128d6b

Set error and error_type tags

3fc5ceb

Use "" instead of null for the role in CompletionDecorator to comply …

021a9d1

…with TestOpenAiLlmInteractions::test_completion

Use "" instead of null for the content to comply with TestOpenAiLlmIn…

0637931

…teractions::test_chat_completion_tool_call

Add missing metadata.tool_choice

0cb41e1

Add missing tool_definitions

a42f8aa

Add source:integration tag

6e10255

Add missing _dd attribute to the llmobs span event

34f3a07

Add missing error tags

a0c1139

Remove error from the llmobs span event. It must be part of meta block

effc343

Add missing meta.text.verbosity

c0e3876

Add summaryText and encrypted_content

b000770

Add missing tool_calls and tool_results for responses

53471a2

Always set stream param to produce the same request body to be aligne…

2207c46

…d with python openai instrumentation and system-tests

Fix OpenAI Responses prompt tracking to use response instructions fir…

7d683b6

…st and return [image] (not empty) when stripped input_image URLs are missing, aligning mixed-input chat_template output with expected behavior.

Set LLMObs error-path defaults in Java to always emit model_name and …

2c17ddc

…output.messages from request params so existing error-span tests pass.

Add OpenAI Responses tool definition extraction to populate LLMObs to…

ad3b782

…ol_definitions tags

Fix ChatCompletionServiceTest

1810327

Extract JsonValueUtils

46221e4

Refactor OpenAI responses instrumentation to reuse ToolCallExtractor …

61ad667

…JSON argument parsing and remove duplicate manual parsing logic from ResponseDecorator.

Fix test assertions

f0957b7

Kyle-Verhoog reviewed Mar 22, 2026

ygree force-pushed the ygree/llmobs-systest-fixes branch from 6dcdaf4 to 717a8f0 Compare March 24, 2026 18:43

ygree commented Mar 24, 2026

Catch exception not throwable

8420f0a

ygree added 7 commits March 24, 2026 16:00

Add JsonValueUtilsTest

91707fa

Test that on HTTP error, the OpenAI response span retains model_name …

3d12515

…and placeholder output set by withResponseCreateParams.

Add "create response with prompt tracking" test to improve coverage o…

576cec7

…f enrichInputWithPromptTracking(), extractChatTemplate(), extractPromptFromParams(), and normalizePromptVariable()

Add "create response with custom tool call" test to improve coverage …

ba0cb27

…of getToolCall

Prevent NPE when tag value is null

8be92d7

Replace catch Throwable with catch Exception

1036ed4

responseCreateParamsWithPromptTracking support both known and unknown…

9911c51

… format. Test cover extractPromptFromParams and related methods

ygree requested a review from a team as a code owner March 26, 2026 04:08

ygree added this to the 1.61.0 milestone Mar 26, 2026

ygree requested a review from Kyle-Verhoog March 26, 2026 18:13

ygree added 2 commits March 27, 2026 12:07

Use safe accessors throughout OpenAI Java decorators

c8f1a02

Simplify OpenAI decorator fallbacks without dropping raw JSON support

61de1c0

amarziali reviewed Mar 30, 2026

clean up dead code

1b6615a

Kyle-Verhoog approved these changes Mar 31, 2026

ygree added 6 commits March 30, 2026 20:56

improve coverage for chat/completion test with raw tool definition

d7d4866

improve coverage for extractFunctionToolDefinition

173db8a

improve coverage for extractFunctionToolDefinition

c25c3e9

improve coverage for mcp tool call

9e1afcf

improve coverage for tool choice

a17b7cd

Fix latestDepTest

408d335

gh-worker-dd-mergequeue-cf854d bot merged commit e307e2c into master Apr 1, 2026
566 checks passed

gh-worker-dd-mergequeue-cf854d bot deleted the ygree/llmobs-systest-fixes branch April 1, 2026 17:52

3 participants